if (!requireNamespace("tidyverse", quietly = TRUE))
install.packages('tidyverse')
if (!requireNamespace("Seurat", quietly = TRUE))
install.packages('Seurat')
if (!requireNamespace("colorBlindness", quietly = TRUE))
install.packages('colorBlindness')
if (!requireNamespace("RColorBrewer", quietly = TRUE))
install.packages('RColorBrewer')
if (!requireNamespace("cluster", quietly = TRUE))
install.packages('cluster')6 - Clustering
Clustering
Introduction
This notebook picks up on the cell filtering from the QC notebook and circles back to some of the PCA content from notebook 4. Below, we cover essential Seurat functions as well as clustering metrics that give quantitative guidelines to how well the clustering process is performing.
Key Takeaways
one
two
three
Glossary
Resources
Load Libraries and Data
Libraries
suppressPackageStartupMessages({
library(dplyr)
library(Seurat)
library(tidyverse)
library(RColorBrewer)
library(colorBlindness)
library(DoubletFinder)
library(cluster)
})
set.seed(687)Load Data
# Load the Seurat object with doublet and batch information
se <- readRDS('../data/workshop-data-withQuality.rds')
se An object of class Seurat
33234 features across 59572 samples within 1 assay
Active assay: RNA (33234 features, 3000 variable features)
3 layers present: counts, data, scale.data
2 dimensional reductions calculated: pca, umap
# Set color palette
pal <- paletteMartin
names(pal) <- sort(unique(se$Celltype))We left off (last session) filtering out the poor quality data without regard for its distribution. Last week, we learned how principle components are calculated, what a latent space is, and defined a kNN (k-nearest neighbors) graph.
To review those:
We take a cell x gene matrix, normalize it, and reduce its dimensionality using PCA. We looked at an Elbow plot to determine the best number of PCs that accurately show the variance explained. After selecting the top 30 PCs, we generated a k-nearest neighbors (kNN) graph to represent the relationships between cells based on their gene expression profiles and ran FindClusters() to identify clusters of cells based on their neighborhood relationships. Kyle introduced Harmony as an integration technique to correct for batch effects and I just went over QC metrics including doublet detection that help us understand the quality and content of our data.
Today, we’re going to: 1. Look at the FindClusters() function, its default parameters, and the algorithm(s) it relies on. 2. Plot a UMAP
Overview
PCA - A dimensionality reduction technique that reduces the number of dimensions in a dataset while retaining as much variance as possible. The first principal component accounts for the most variance in the data, and each subsequent component accounts for less variance.
Latent Space - The latent space is the low-dimensional representation of the data that captures the most important features.
kNN Graph - A graph that represents the relationships between cells based on their gene expression profiles. Each cell is connected to its k (1, 20, 100) nearest neighbors in the graph.
Preprocessing
UMAP of Counts and QC Metrics
Clustering
Before clustering, these are the preprocessing steps we took:
# se %>%
# NormalizeData(verbose = FALSE) %>%
# FindVariableFeatures(
# method = "vst",
# nfeatures = 3000,
# verbose = FALSE) %>%
# ScaleData(verbose = FALSE) %>% # Scales data to have mean 0 and variance 1
# RunPCA(verbose = FALSE) Plots:
PCA = scree plot or biplot
cluster visualization or cluster heatmaps
PCA
Let’s start by visualizing PC1 and PC2 by Celltype.
# DimPlot allows us to color the latent space based on categorical values per cell, like cell type or which sample the cell is from
celltype_pca <- DimPlot(
se,
reduction = "pca",
group.by = 'Celltype'
)
celltype_pcaConstruct the kNN graph:
FindNeighbors() is the Seurat function that calculates the K-nearest neighbors of each cell in PCA space. The number of neighbors used for this calculation is controlled by the k.param parameter in the FindNeighbors function. The default value is 30. The function also calculates the distance between each cell and its neighbors. The function computes pairwaise distances between cells based on their gene expression common ex: euclidean distance, manhattan distance, Pearson correlation distance, cosine similarity. choice matters.
From BioStars FindNeighbors() is a function that is used to find the nearest neighbors of your single cell data point within a dataset. It works by calculating the neighborhood overlap (Jaccard index) between every cell and its k. param nearest neighbors. It’s often employed in various applications such as anomaly detection, and dimensionality reduction. The concept is that given a data point, you want to identify the closest data points to it based on some similarity metric, such as Euclidean distance or cosine similarity. This helps to identify similar points in the dataset, which can be useful for making predictions or understanding the distribution of the data.*
FindNeighbors()
The default values of FindNeighbors() are:
FindNeighbors( object, reduction = "pca", dims = 1:10, assay = NULL, features = NULL, k.param = 20, return.neighbor = FALSE, compute.SNN = !return.neighbor, prune.SNN = 1/15, nn.method = "annoy", n.trees = 50, annoy.metric = "euclidean", nn.eps = 0, verbose = TRUE, do.plot = FALSE, graph.name = NULL, l2.norm = FALSE, cache.index = FALSE, ... )
Let’s modify these to fit our analysis:
se_1 <- se %>%
FindNeighbors(
object = se,
reduction = "pca",
dims = 1:30,
k.param = 30,
verbose = FALSE
)se_1@graphs$RNA_nn
A Graph object containing 59572 cells
$RNA_snn
A Graph object containing 59572 cells
TKTK: is there any reason or way to look at the neighbors computed?
FindClusters
From BioStars: FindClusters() is a function used for clustering data points into groups or clusters based on their similarity. It uses a graph-based clustering approach and a Louvain algorithm. Clustering is an unsupervised learning technique where the algorithm groups similar cells together without any predefined labels. The goal is to find patterns and structure in your data. The number of clusters and the algorithm used can vary based on the problem and data characteristics. Common clustering algorithms include K-means, hierarchical clustering, and DBSCAN.
FindClusters() is used for identifying clusters of cells based on their neighborhood relationships typically obtained from PCA or t-SNE. It inputs the graph made from FindNeighbors and. outputs cluster assignments for each cell found at se@meta.data$seurat_clusters.
There are a couple of popular clustering algorithms. There is no one way to cluster as clustering is a means of looking at the data from different angles. The most popular clustering algortihms are the louvain algorithm, leiden algorithm, hierarchical clustering, and k-means clustering. Seruat uses the Leiden algorithm by default which is an improvement on the Louvain algorithm.
The resolution parameter controls the granularity of the clustering. Higher values of resolution will result in more clusters, while lower values will result in fewer clusters. The default value is 0.8.
In clustering, the goal is not to see how many clusters you can pull apart but it is an iterative process. especially in the first pass, you want to pull apart main cell groups such as epithelial cells and immune cells so you can further refine the clustering to get more granular cell types in the next iteration.
After clustering, we’ll review some cluster validation techniques to qualitatively and quantitatively check the quality of the clustering results.
FindClusters
The default values for FindClusters() are: FindClusters( object, graph.name = NULL, cluster.name = NULL, modularity.fxn = 1, initial.membership = NULL, node.sizes = NULL, resolution = 0.8, method = “matrix”, algorithm = 1, n.start = 10, n.iter = 10, random.seed = 0, group.singletons = TRUE, temp.file.location = NULL, edge.file.name = NULL, verbose = TRUE, … ) where when ‘algorithm’ = 1 original Louvain algorithm = 2 Louvain algorithm with multilevel refinement = 3 SLM (Smart Local Moving) algorithm = 4 Leiden algorithm See ?FindClusters for more information
We modify those to fit our object:
se_1 <- FindClusters(
object = se_1,
resolution = c(0.01, 0.05, 0.1, 0.15, 0.2,0.25)) Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 59572
Number of edges: 3484220
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9956
Number of communities: 5
Elapsed time: 19 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 59572
Number of edges: 3484220
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9852
Number of communities: 10
Elapsed time: 20 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 59572
Number of edges: 3484220
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9763
Number of communities: 13
Elapsed time: 21 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 59572
Number of edges: 3484220
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9700
Number of communities: 15
Elapsed time: 24 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 59572
Number of edges: 3484220
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9654
Number of communities: 17
Elapsed time: 22 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 59572
Number of edges: 3484220
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9613
Number of communities: 20
Elapsed time: 24 seconds
# Find clusters based on nearest neighbors at a resolution of 0.5# Results seen at RNA_snn_res in metadata
se_1@meta.data %>% select(contains("RNA_snn_res")) %>% colnames()[1] "RNA_snn_res.0.01" "RNA_snn_res.0.05" "RNA_snn_res.0.1" "RNA_snn_res.0.15" "RNA_snn_res.0.2" "RNA_snn_res.0.25"
Louvain vs Leiden
TKTK Define “community detection algorithm”. Shared nearest neighbors, why we use sNN. Define unsupervised clustering. we’re computing this over the dimensionality reduction space that we chose.
The Louvian algorithm was developed in 2008 and is a most popular community detection algorithm widely used in scRNA-seq. It recursively merges communities into a single node and executes the modularity clustering on the condensed graphs.(Zhang) Both Seurat and scanpy use Louvain as the default clustering algorithm.
Leiden Algorithm
The Leiden algorithm was published in 2020 as an improvement of the Louvain algorithm. Leiden creates clusters by taking into account the number of links between cells in a cluster versus the overall expected number of links in the dataset. It computes a clustering on a KNN graph obtained from the PC reduced expression space. It starts with an initial partition where each node from its own community. Next, the algorithm moves single nodes from one community to another to find a partition, which is then refined. Based on a refined partition an aggregate network is generated, which is again refined until no further improvements can be obtained, and the final partition is reached. (Single Cell Best Practices)
RunUMAP
Show that one weird figure figure as it goes from one resolution to the next. Show that hierarchal clustering takes 2 groups and splits each group into more rgoups. but with graph based, a higher resolution splits the whole dataset into more groups. k-means is greedy algo splits into equal size clusters. Look for proliferating cells. Look at MS4A1 for proliferating cells. For choosing 0.05 as the resolution, compare to 0.1 and note the big blobs splitting therefore keep them together for first round of clustering and get into the weeds in level 2 annotation.
- This function runs the UMAP algorithm on the PCA space. The dims parameter specifies which dimensions of the PCA space to use for the UMAP calculation. The default value is 1:30, which uses all dimensions. The n.components parameter specifies the number of dimensions in the UMAP embedding. The default value is 2.
se_1 <- se_1 %>%
RunUMAP(dims = 1:30, verbose = FALSE) # Run UMAP Visualize Clusters
DimPlot(
se_1,
group.by = c(
"RNA_snn_res.0.01", "RNA_snn_res.0.05",
"RNA_snn_res.0.1", "RNA_snn_res.0.15",
"RNA_snn_res.0.2", "RNA_snn_res.0.25"),
label = TRUE)Different Resolutions
0.01 vs 0.05 Resolution
DimPlot(
se_1,
group.by = c(
"Celltype","RNA_snn_res.0.05", "RNA_snn_res.0.1"),
label = TRUE)Let’s go with 0.05 and perform cluster validation.
Cluster Metrics - most important part.
Cluster Diversity
1 - Number of Cells in each Cluster
seurat_clusters <- "RNA_snn_res.0.05"
diversityPerGroup <- function(df, species, group, diversity_metric = 'simpson') {
# Convert strings to symbols for curly-curly operator
species_sym <- rlang::sym(species)
group_sym <- rlang::sym(group)
# Count groups per species directly using curly-curly
tierNcount <- df %>%
group_by({{species_sym}}) %>%
count({{group_sym}}, name = "n") %>% ungroup
# Pivot table to allow vegan::diversity call
tierNwide <- tierNcount %>%
pivot_wider(names_from = {{group_sym}}, values_from = n, values_fill = list(n = 0))
# Use rownames of species for the diversity function, which needs a dataframe
tierNwide_df <- as.data.frame(tierNwide)
# species column must be first
tierNwide_df <- tierNwide_df %>% select({{species}}, everything())
rownames(tierNwide_df) <- tierNwide_df[, 1]
tierNwide_df <- tierNwide_df[, -1]
# Calculate diversity
diversity_values <- vegan::diversity(tierNwide_df, index = diversity_metric)
# Prepare result as a dataframe
result <- data.frame(
species = rownames(tierNwide_df),
diversity = diversity_values,
row.names = NULL
)
# Rename diversity column
names(result)[1] <- species
names(result)[2] <- sprintf('%s_diversity', group)
return(result)
}
# Calculate simpson's diversity per cluster
clusterMetrics <- diversityPerGroup(se_1@meta.data,
species = 'RNA_snn_res.0.05',
group = 'Sample ID',
diversity_metric = 'simpson')
# Calculate number of cells per cluster and join to metrics table
clusterMetrics <- clusterMetrics %>% left_join(se_1@meta.data %>% count(RNA_snn_res.0.05))
# clusterMetrics
# p1 <- ggplot(clusterMetrics, aes(x = Celltype, y = n)) +
# geom_bar(stat = "identity", fill = 'black') +
# scale_y_log10() +
# labs(x = "Cell Type", y = "Cell Number (log scale)") +
# theme_minimal() +
# theme(axis.text.x = element_text(angle = 45, hjust = 1))library(viridis)
seurat_clusters <- "RNA_snn_res.0.05"
clusterMetrics$seurat_clusters <- as.numeric(clusterMetrics$RNA_snn_res.0.05)
lollipop <- ggplot(clusterMetrics, aes(x = seurat_clusters, y = n)) +
geom_segment(aes(x = seurat_clusters, xend = seurat_clusters, y = 0, yend = n),
size = 1.5, color = 'grey80') + # Draw lines for lollipops
geom_point(aes(colour = `Sample ID_diversity`), size = 5) + # Add colored lollipop 'heads', coloring by 'Sample ID_diversity'
scale_y_log10() +
scale_x_continuous(breaks = seq(0,20)) +
scale_colour_viridis(option = "C", name = "Sample ID Diversity", direction = 1, limits = c(0,1)) + # Colorblind-friendly, vibrant color palette
theme_minimal(base_size = 10) +
theme(legend.position = "bottom",
axis.text = element_text(size = 12),
axis.title = element_text(size = 14),
title = element_text(size = 16)) +
labs(x = "Seurat Clusters",
y = "cluster size (log-scaled)",
title = "Cluster Diversity Metrics") +
guides(colour = guide_colourbar(title.position = "top", title.hjust = 0.5))
lollipop2 - Which samples are in each cluster
# Plot sample distribution per cluster
# Load necessary packages
library(ggplot2)
library(dplyr)
library(tidyr)
# Assuming se_1@meta.data has columns 'seurat_clusters' and 'Sample ID'
# Prepare the data for plotting
plot_sample <- se_1@meta.data %>%
count(RNA_snn_res.0.05, `Sample ID`) %>%
group_by(RNA_snn_res.0.05) %>%
mutate(proportion = n / sum(n)) %>%
ungroup()
# Create a stacked bar plot
ggplot(plot_sample, aes(x = factor(RNA_snn_res.0.05), y = proportion, fill = `Sample ID`)) +
geom_bar(stat = "identity", position = "stack") +
labs(x = "Seurat Clusters", y = "Proportion", fill = "Sample ID",
title = "Distribution of Sample IDs Across Clusters") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_manual(values = pal)# TD: use Marc's color scheme for donors- factor analysis# Prepare the data for plotting
plot_batch <- se_1@meta.data %>%
count(RNA_snn_res.0.05, batch) %>%
group_by(RNA_snn_res.0.05) %>%
mutate(proportion = n / sum(n)) %>%
ungroup()
# Create a stacked bar plot
ggplot(plot_batch, aes(x = factor(RNA_snn_res.0.05), y = proportion, fill = batch)) +
geom_bar(stat = "identity", position = "stack") +
labs(x = "Seurat Clusters", y = "Proportion", fill = "Batch",
title = "Distribution of Batches Across Clusters") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))3 - Stacked Bar Plot of Cluster Diversity
p2 <- ggplot(clusterMetrics, aes(y=as.character(seurat_clusters), fill=`Sample ID_diversity`, x = 1, label = n)) +
geom_tile(colour = "white") +
geom_text(nudge_x = 1.5, size = 3) +
geom_text(aes(label = signif(`Sample ID_diversity`, 2)),size = 3) +
scale_fill_distiller(palette = "Spectral", limits = c(0,1)) + theme_classic() +
coord_fixed(ratio = 0.25) +
expand_limits(x = c(0.5,3)) +
labs(x = "Diversity Size") +
theme(axis.text.y = element_text(hjust = 1, vjust = 0.5, size = 12),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_text(size = 15),
legend.key.size = unit(1, 'cm'),
legend.title = element_text(size=10),
legend.text = element_text(size=10)
)
p2Silhouette Analysis
As covered in the last workshop, silhouette analysis is a way to measure how similar an object is to its own cluster compared to other clusters. The silhouette value ranges from -1 to 1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters.
seurat_clusters <- se_1@meta.data$RNA_snn_res.0.05
pca_embeddings <- Embeddings(se, reduction = 'pca')
# Calculate silhouette widths
sil_scores <- silhouette(x = as.numeric(seurat_clusters), dist = dist(pca_embeddings))
# Extract silhouette scores
silhouette_data <- as.data.frame(sil_scores)
# Recover cell type names
silhouette_data$seurat_clusters <- as.character(seurat_clusters)
row.names(silhouette_data) <- row.names(pca_embeddings)
silhouette_arranged <- silhouette_data %>%
group_by(seurat_clusters) %>%
arrange(-sil_width)overall_average <- silhouette_arranged %>%
ungroup %>%
summarize(ave = as.numeric(mean(sil_width))) %>%
pull(ave)
full_plot <- ggplot(silhouette_arranged,
aes(x = sil_width,
y = seurat_clusters,
fill = seurat_clusters,
group = seurat_clusters)) +
geom_bar(stat = "identity", position = 'dodge2') +
geom_vline(xintercept = overall_average,
color = 'red',
linetype = 'longdash') +
theme_minimal() +
labs(title = "Silhouette Analysis",
y = "Cluster",
x = "Silhouette width",
fill = "Cluster") +
theme(axis.text.y = element_text(hjust = 1, vjust = 0.5, size = 20),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_text(size = 20),
legend.position = "None")
full_plotLook at silhouette scores on a UMAP
d5 <- DimPlot(se_1,
reduction='umap',
group.by='RNA_snn_res.0.05')
se_1$CellID <- row.names(se_1@meta.data)
sil_ids <- silhouette_data %>% rownames_to_column('CellID') %>% left_join(se_1@meta.data, by='CellID')
head(silhouette_data) cluster neighbor sil_width seurat_clusters
AAACCCAAGAATGTTG-12 1 3 0.4380270 0
AAACCCAAGCATTGTC-19 1 3 0.3523827 0
AAACCCAAGCTACGTT-6 1 6 0.3017238 0
AAACCCAAGGCCGCTT-3 5 7 0.2647330 4
AAACCCAAGGCCTGCT-12 1 3 0.2760281 0
AAACCCAAGGGCAATC-1 3 1 0.4495139 2
head(se_1@meta.data) orig.ident nCount_RNA nFeature_RNA Sample ID Disease group Comorbidity Hospital day WBC per microL Neutrophil per microL (%) Lymphocyte per microL (%) Monocyte prt microL (%) C-reactive protein (mg per dL) Chest X-ray Treatment Respiratory rate (BPM) O2 saturation O2 supplement Temperature Systolic BP Heart rate (BPM) Consciousness NEWS score Severity Celltype Number of UMI Number of Gene Percentage of mitochondrial gene Age tissue_ontology_term_id assay_ontology_term_id disease_ontology_term_id cell_type_ontology_term_id self_reported_ethnicity_ontology_term_id development_stage_ontology_term_id sex_ontology_term_id is_primary_data organism_ontology_term_id donor_id suspension_type tissue_type cell_type assay disease organism sex tissue self_reported_ethnicity development_stage observation_joinid perc.mt perc.ribo perc.hb Experimental batch batch pANN
AAACCCAAGAATGTTG-12 SeuratProject 5322 1910 nCoV 6 mild COVID-19 diabetes mellitus, dyslipidemia 3 5590 3119 (55.8) 1834 (32.8) 637 (11.4) 0.59 pneumonia lopinavir and ritonavir, hydroxychloroquine 18 97 Room air 37.3 92 80 Alert 2 mild NK cell 5322 1910 3.100338 38 UBERON:0000178 EFO:0009922 MONDO:0100096 CL:0000623 unknown HsapDv:0000132 PATO:0000383 TRUE NCBITaxon:9606 nCoV 6 cell tissue natural killer cell 10x 3' v3 COVID-19 Homo sapiens female blood unknown 38-year-old human stage 1s~)u&E!VB 3.100338 23.22435 0.01878993 3 3 0.21890415
AAACCCAAGCATTGTC-19 SeuratProject 7644 2292 Normal 4 Healthy Donor not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable CD4, EM-like 7646 2294 7.847240 63 UBERON:0000178 EFO:0009922 PATO:0000461 CL:0001044 unknown HsapDv:0000157 PATO:0000384 TRUE NCBITaxon:9606 Normal 4 cell tissue effector CD4-positive, alpha-beta T cell 10x 3' v3 normal Homo sapiens male blood unknown 63-year-old human stage T<AIMsSxFw 7.849294 29.72266 0.02616431 4 4 0.05454545
AAACCCAAGCTACGTT-6 SeuratProject 5406 1945 Flu 3 severe influenza kidney transplantation status due to ESRD, atrial fibrillation, stroke, hypertension, diabetes, history of pulmonary tuberculosis nan 15400 12813 (83.2) 755 (4.9) 1725 (11.2) 6.65 total collapse LLL, pleural effusion, pericardial effusion nan nan not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable CD8, non-EM-like 5409 1948 9.521168 70 UBERON:0000178 EFO:0009922 MONDO:0005812 CL:0000625 unknown HsapDv:0000164 PATO:0000384 TRUE NCBITaxon:9606 Flu 3 cell tissue CD8-positive, alpha-beta T cell 10x 3' v3 influenza Homo sapiens male blood unknown 70-year-old human stage u8rl}V-`Kn 9.526452 19.18239 0.00000000 2 2 0.03508772
AAACCCAAGGCCGCTT-3 SeuratProject 3981 1583 Flu 1 severe influenza hypertension, rheumatoid arthritis nan 12900 10655 (82.6) 1174 (9.1) 1032 (8.0) 9.39 peribronchial opacity in both lungs, pleural effusion nan nan not applicable not applicable not applicable not applicable not applicable not applicable not applicable not applicable classical Monocyte 3982 1584 6.428930 68 UBERON:0000178 EFO:0009922 MONDO:0005812 CL:0000860 unknown HsapDv:0000162 PATO:0000384 TRUE NCBITaxon:9606 Flu 1 cell tissue classical monocyte 10x 3' v3 influenza Homo sapiens male blood unknown 68-year-old human stage _)<TM?x-7b 6.430545 11.93168 0.07535795 1 1 0.16921606
AAACCCAAGGCCTGCT-12 SeuratProject 3360 1537 nCoV 6 mild COVID-19 diabetes mellitus, dyslipidemia 3 5590 3119 (55.8) 1834 (32.8) 637 (11.4) 0.59 pneumonia lopinavir and ritonavir, hydroxychloroquine 18 97 Room air 37.3 92 80 Alert 2 mild NK cell 3361 1538 6.486165 38 UBERON:0000178 EFO:0009922 MONDO:0100096 CL:0000623 unknown HsapDv:0000132 PATO:0000383 TRUE NCBITaxon:9606 nCoV 6 cell tissue natural killer cell 10x 3' v3 COVID-19 Homo sapiens female blood unknown 38-year-old human stage V$~fT4_K{L 6.488095 12.79762 0.00000000 3 3 0.20957206
AAACCCAAGGGCAATC-1 SeuratProject 15674 3308 nCoV 1 severe COVID-19 no comorbidity 16 21540 19235 (89.3) 1055 (4.9) 1271 (5.9) 7.58 pneumonia lopinavir and ritonavir, hydroxychloroquine, nafamostat 24 90 ECMO + MV 37.6 92 122 Unresponsive 14 severe B cell, IgG+ 15679 3311 7.296384 63 UBERON:0000178 EFO:0009922 MONDO:0100096 CL:0000979 unknown HsapDv:0000157 PATO:0000384 TRUE NCBITaxon:9606 nCoV 1 cell tissue IgG memory B cell 10x 3' v3 COVID-19 Homo sapiens male blood unknown 63-year-old human stage W+utHdOui6 7.298711 38.08855 0.01913998 1 1 0.29302103
DF_class doublet_run pK pN quality RNA_snn_res.0.01 RNA_snn_res.0.05 RNA_snn_res.0.1 RNA_snn_res.0.15 RNA_snn_res.0.2 RNA_snn_res.0.25 seurat_clusters CellID
AAACCCAAGAATGTTG-12 Singlet 4142 0.3 0.25 good quality 0 0 2 1 0 0 0 AAACCCAAGAATGTTG-12
AAACCCAAGCATTGTC-19 Singlet 4185 0.005 0.25 good quality 0 0 0 4 4 4 4 AAACCCAAGCATTGTC-19
AAACCCAAGCTACGTT-6 Singlet 4120 0.005 0.25 good quality 0 0 0 2 1 1 1 AAACCCAAGCTACGTT-6
AAACCCAAGGCCGCTT-3 Singlet 4180 0.1 0.25 good quality 4 4 5 6 7 6 6 AAACCCAAGGCCGCTT-3
AAACCCAAGGCCTGCT-12 Singlet 4142 0.3 0.25 good quality 0 0 2 1 0 0 0 AAACCCAAGGCCTGCT-12
AAACCCAAGGGCAATC-1 Doublet 4180 0.1 0.25 good quality 2 2 3 3 3 3 3 AAACCCAAGGGCAATC-1
se_1 <- AddMetaData(se_1, sil_ids)
FeaturePlot(se_1, feature = "sil_width") + ggtitle('Silhouette width') + scale_color_viridis_c(limits = c(-1,1), option = "magma") | d5# TD: change color palettelibrary(ggplot2)
# Convert the 'batch' variable to a character
se@meta.data$batch <- as.character(se@meta.data$batch)
d1 <- DimPlot(se,
reduction='umap',
group.by='batch')
d2 <- DimPlot(se,
reduction='umap',
group.by='DF_class')
d3 <- DimPlot(se,
reduction='umap',
group.by='Sample ID')
d4 <- DimPlot(se,
reduction='umap',
group.by='Celltype')
d1 | d2d3 | d4Session Info
sessionInfo()R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] viridis_0.6.5 viridisLite_0.4.2 cluster_2.1.6 DoubletFinder_2.0.4 colorBlindness_0.1.9 RColorBrewer_1.1-3 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0 Seurat_5.0.3 SeuratObject_5.0.2 sp_2.1-4 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] rstudioapi_0.16.0 jsonlite_1.8.8 magrittr_2.0.3 spatstat.utils_3.0-4 farver_2.1.2 rmarkdown_2.27 vctrs_0.6.5 ROCR_1.0-11 spatstat.explore_3.2-7 htmltools_0.5.8.1 gridGraphics_0.5-1 sctransform_0.4.1 parallelly_1.37.1 KernSmooth_2.23-22 htmlwidgets_1.6.4 ica_1.0-3 plyr_1.8.9 plotly_4.10.4 zoo_1.8-12 igraph_2.0.3 mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.6-5 R6_2.5.1 fastmap_1.2.0 fitdistrplus_1.1-11 future_1.33.2 shiny_1.8.1.1 digest_0.6.35 colorspace_2.1-0 patchwork_1.2.0 tensor_1.5 RSpectra_0.16-1 irlba_2.3.5.1 vegan_2.6-6 labeling_0.4.3 progressr_0.14.0 fansi_1.0.6 spatstat.sparse_3.0-3 timechange_0.3.0 mgcv_1.9-1 httr_1.4.7 polyclip_1.10-6 abind_1.4-5 compiler_4.3.2 withr_3.0.0 fastDummies_1.7.3 MASS_7.3-60.0.1 permute_0.9-7 tools_4.3.2
[52] lmtest_0.9-40 httpuv_1.6.15 future.apply_1.11.2 goftest_1.2-3 glue_1.7.0 nlme_3.1-164 promises_1.3.0 grid_4.3.2 Rtsne_0.17 reshape2_1.4.4 generics_0.1.3 gtable_0.3.5 spatstat.data_3.0-4 tzdb_0.4.0 data.table_1.15.4 hms_1.1.3 utf8_1.2.4 spatstat.geom_3.2-9 RcppAnnoy_0.0.22 ggrepel_0.9.5 RANN_2.6.1 pillar_1.9.0 spam_2.10-0 RcppHNSW_0.6.0 later_1.3.2 splines_4.3.2 lattice_0.22-6 survival_3.6-4 deldir_2.0-4 tidyselect_1.2.1 miniUI_0.1.1.1 pbapply_1.7-2 knitr_1.45 gridExtra_2.3 scattermore_1.2 xfun_0.44 matrixStats_1.3.0 stringi_1.8.4 lazyeval_0.2.2 yaml_2.3.8 evaluate_0.23 codetools_0.2-20 cli_3.6.2 uwot_0.1.16 xtable_1.8-4 reticulate_1.36.1 munsell_0.5.1 Rcpp_1.0.12 globals_0.16.3 spatstat.random_3.2-3 png_0.1-8
[103] parallel_4.3.2 dotCall64_1.1-1 listenv_0.9.1 scales_1.3.0 ggridges_0.5.6 leiden_0.4.3.1 rlang_1.1.3 cowplot_1.1.3